model assessment
Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models
Oates, Chris, Niederer, Steven, Lee, Angela, Briol, François-Xavier, Girolami, Mark
This paper studies the numerical computation of integrals, representing estimates or predictions, over the output $f(x)$ of a computational model with respect to a distribution $p(\mathrm{d}x)$ over uncertain inputs $x$ to the model. For the functional cardiac models that motivate this work, neither $f$ nor $p$ possess a closed-form expression and evaluation of either requires $\approx$ 100 CPU hours, precluding standard numerical integration methods. Our proposal is to treat integration as an estimation problem, with a joint model for both the a priori unknown function $f$ and the a priori unknown distribution $p$. The result is a posterior distribution over the integral that explicitly accounts for dual sources of numerical approximation error due to a severely limited computational budget. This construction is applied to account, in a statistically principled manner, for the impact of numerical errors that (at present) are confounding factors in functional cardiac model assessment.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
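The posterior over the integral described above replaces the crude CLT-based error bar of classical Monte Carlo, which is the natural baseline when each model evaluation is expensive. A minimal pure-Python sketch of that baseline (all names hypothetical, with a cheap toy integrand standing in for the expensive cardiac model):

```python
import math
import random
import statistics

def mc_integral(f, sampler, n, seed=0):
    """Classical Monte Carlo estimate of E_p[f(x)] with a CLT error bar.

    When each evaluation of f costs ~100 CPU hours, n is tiny and this
    standard error is the only (crude) account of integration error --
    the baseline the paper's joint model improves upon.
    """
    rng = random.Random(seed)
    vals = [f(sampler(rng)) for _ in range(n)]
    mean = statistics.fmean(vals)
    se = statistics.stdev(vals) / math.sqrt(n)  # sigma_hat / sqrt(n)
    return mean, se

# Toy stand-in: f(x) = x^2 under a standard normal, so E[f] = 1.
estimate, stderr = mc_integral(lambda x: x * x,
                               lambda rng: rng.gauss(0.0, 1.0),
                               1000)
```

With the fixed seed the estimate lands near the true value 1 with a standard error of roughly 0.045.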
Model Assessment and Selection under Temporal Distribution Shift
Han, Elise, Huang, Chengpiao, Wang, Kaizheng
Statistical learning theory is traditionally founded on the assumption of a static data distribution, where statistical models are trained and deployed in the same environment. However, this assumption is often violated in practice, where the data distribution keeps changing over time. The temporal distribution shift can lead to a serious decline in model performance post-deployment, which underlines the critical need to monitor models and detect potential degradation. Moreover, one often needs to choose among multiple candidate models originating from different learning algorithms (e.g., linear regression, random forests, neural networks) and hyperparameters (e.g., penalty parameter, step size, time window for training). Temporal distribution shift poses a major challenge to model selection, as past performance may not reliably predict future outcomes. Learners usually have to work with limited data from the current time period and abundant historical data, whose distributions may vary significantly.
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (3 more...)
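The abstract's point can be made concrete: under temporal shift, assessing candidates on pooled history and on a recent window can select different models. A hedged pure-Python sketch (function names are illustrative, not from the paper):

```python
def recent_window_loss(y_true, y_pred, window):
    """Mean squared error restricted to the most recent `window` points."""
    recent_t = y_true[-window:]
    recent_p = y_pred[-window:]
    return sum((t - p) ** 2 for t, p in zip(recent_t, recent_p)) / len(recent_t)

def select_model(y_true, candidate_preds, window):
    """Pick the candidate with the lowest loss on the recent window.

    candidate_preds maps a model name to its predictions aligned with
    y_true. Under temporal shift, pooled historical loss can favor a
    stale model; a short recent window is a simple (but high-variance)
    alternative, illustrating the trade-off the paper studies.
    """
    return min(candidate_preds,
               key=lambda name: recent_window_loss(y_true,
                                                   candidate_preds[name],
                                                   window))

# A shifting series: the target jumps from 0 to 10 near the end.
y = [0.0] * 15 + [10.0] * 5
preds = {"stale": [0.0] * 20, "fresh": [10.0] * 20}
```

Here `select_model(y, preds, 20)` (pooled history) prefers `"stale"`, while `select_model(y, preds, 5)` (recent window) prefers `"fresh"`.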
On Leakage in Machine Learning Pipelines
Sasse, Leonard, Nicolaisen-Sobesky, Eliana, Dukart, Juergen, Eickhoff, Simon B., Götz, Michael, Hamdan, Sami, Komeyer, Vera, Kulkarni, Abhijit, Lahnakoski, Juha, Love, Bradley C., Raimondo, Federico, Patil, Kaustubh R.
Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage, typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding of the causes of leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- (5 more...)
- Research Report (1.00)
- Overview (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
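One of the most common leakage patterns the abstract alludes to is preprocessing fitted on the full dataset before the train/test split. A minimal pure-Python illustration (hypothetical helper names; not an example from the paper):

```python
import statistics

def standardize(values, mean, stdev):
    return [(v - mean) / stdev for v in values]

def leaky_transform(train, test):
    """WRONG: scaling statistics are computed on train + test combined,
    so information about the test set leaks into the training features."""
    full = train + test
    m, s = statistics.fmean(full), statistics.stdev(full)
    return standardize(train, m, s), standardize(test, m, s)

def clean_transform(train, test):
    """RIGHT: statistics are fitted on the training split only, then
    applied unchanged to both splits."""
    m, s = statistics.fmean(train), statistics.stdev(train)
    return standardize(train, m, s), standardize(test, m, s)

train = [1.0, 2.0, 3.0, 4.0]
# Swap in two different test sets and watch the *training* features.
leaky_a, _ = leaky_transform(train, [5.0])
leaky_b, _ = leaky_transform(train, [50.0])
clean_a, _ = clean_transform(train, [5.0])
clean_b, _ = clean_transform(train, [50.0])
```

The leaky training features change when the test point changes, proving test data influenced them; the clean ones do not.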
Spatial machine-learning model diagnostics: a model-agnostic distance-based approach
While significant progress has been made towards explaining black-box machine-learning (ML) models, there is still a distinct lack of diagnostic tools that elucidate the spatial behaviour of ML models in terms of predictive skill and variable importance. This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools for spatial prediction models with a focus on prediction distance. Their suitability is demonstrated in two case studies: a regionalization task in an environmental-science context, and a land cover classification task based on remote-sensing data. In these case studies, the SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences but also relevant similarities. Limitations of related cross-validation techniques are outlined, and the case is made that modelers should focus their model assessment and interpretation on the intended spatial prediction horizon. The range of autocorrelation, in contrast, is not a suitable criterion for defining spatial cross-validation test sets. The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
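The idea of profiling error against prediction distance can be sketched crudely: bin test-set errors by distance to the nearest training location. This is an illustrative simplification of SPEPs under assumed 2-D coordinates, not the authors' implementation:

```python
import math

def nearest_training_distance(pt, train_pts):
    """Euclidean distance from a test point to the closest training point."""
    return min(math.dist(pt, q) for q in train_pts)

def spatial_error_profile(test_pts, errors, train_pts, bin_width):
    """Bin absolute prediction errors by distance to the nearest training
    location -- a rough, model-agnostic analogue of a spatial prediction
    error profile. Returns {bin_index: mean absolute error}."""
    bins = {}
    for pt, err in zip(test_pts, errors):
        b = int(nearest_training_distance(pt, train_pts) // bin_width)
        bins.setdefault(b, []).append(abs(err))
    return {b: sum(v) / len(v) for b, v in sorted(bins.items())}

# Errors tend to grow with distance from the training data.
profile = spatial_error_profile(test_pts=[(1.0, 0.0), (3.0, 0.0)],
                                errors=[0.5, 2.0],
                                train_pts=[(0.0, 0.0)],
                                bin_width=2.0)
```

A rising profile indicates the model extrapolates poorly beyond its training locations, which is exactly the behaviour the prediction-distance focus is meant to expose.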
mlr3spatiotempcv: Spatiotemporal resampling methods for machine learning in R
Schratz, Patrick, Becker, Marc, Lang, Michel, Brenning, Alexander
Spatial and spatiotemporal prediction tasks are common in applications ranging from environmental sciences to archaeology and epidemiology. While sophisticated mathematical frameworks have long been developed in spatial statistics to characterize predictive uncertainties under well-defined mathematical assumptions such as intrinsic stationarity (e.g., Cressie 1993), computational estimation procedures have only been proposed more recently to assess predictive performances of spatial and spatiotemporal prediction models (Brenning 2005, 2012; Pohjankukka, Pahikkala, Nevalainen, and Heikkonen 2017; Roberts, Bahn, Ciuti, Boyce, Elith, Guillera-Arroita, Hauenstein, Lahoz-Monfort, Schröder, Thuiller, Warton, Wintle, Hartig, and Dormann 2017). Although alternatives such as the bootstrap have existed for decades (Efron and Gong 1983; Hand 1997), cross-validation (CV) is a particularly well-established, easy-to-implement algorithm for model assessment of supervised machine-learning models (Efron and Gong 1983, and next section) and model selection (Arlot and Celisse 2010). In its basic form, CV resamples the data without paying attention to any possible dependence structure, which may arise from, e.g., grouped or structured data, or underlying environmental processes inducing some sort of spatial coherence at the landscape scale. In treating dependent observations as independent, or ignoring autocorrelation, CV test samples may in fact be heavily correlated with, or even pseudo-replicates of, the data used for training the model, which introduces a potentially severe bias in assessing the transferability of flexible machine-learning (ML) models.
- North America > United States > New York (0.04)
- Europe > Spain > Aragón (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (4 more...)
- Energy (0.46)
- Food & Agriculture > Agriculture (0.46)
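The block-resampling idea behind spatial CV can be sketched by assigning observations to grid cells and holding out whole cells together, so that spatially correlated neighbours never straddle the train/test boundary. A toy analogue of spatial resampling, not the package's API:

```python
def grid_block_folds(points, cell_size):
    """Assign each (x, y) point to a square grid cell; each occupied cell
    becomes one CV fold, so nearby (spatially autocorrelated) points are
    held out together rather than leaking into the training set."""
    folds = {}
    for i, (x, y) in enumerate(points):
        cell = (int(x // cell_size), int(y // cell_size))
        folds.setdefault(cell, []).append(i)
    return list(folds.values())

# Two tightly clustered points end up in the same fold; the distant
# point forms its own fold.
folds = grid_block_folds([(0.1, 0.1), (0.2, 0.3), (5.0, 5.0)],
                         cell_size=1.0)
```

Random k-fold CV, by contrast, would happily place the two clustered points on opposite sides of the split, producing the optimistic bias described above.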
Approximate Cross-validation: Guarantees for Model Assessment and Selection
Wilson, Ashia, Kasy, Maximilian, Mackey, Lester
Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. Recent work in empirical risk minimization (ERM) approximates the expensive refitting with a single Newton step warm-started from the full training set optimizer. While this can greatly reduce runtime, several open questions remain including whether these approximations lead to faithful model selection and whether they are suitable for non-smooth objectives. We address these questions with three main contributions: (i) we provide uniform non-asymptotic, deterministic model assessment guarantees for approximate CV; (ii) we show that (roughly) the same conditions also guarantee model selection performance comparable to CV; (iii) we provide a proximal Newton extension of the approximate CV framework for non-smooth prediction problems and develop improved assessment guarantees for problems such as l1-regularized ERM.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
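For a quadratic objective, the Newton-step approximation described above is exact, which makes a tiny 1-D ridge regression a useful sanity check. A sketch under that assumption (names are illustrative, not from the paper):

```python
def ridge_1d(xs, ys, lam):
    """Full-data minimizer of sum_j (y_j - w x_j)^2 + lam w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def exact_loo(xs, ys, lam, i):
    """Exact leave-one-out: drop point i and re-solve from scratch."""
    return ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)

def newton_loo(xs, ys, lam, i):
    """One Newton step on the leave-i-out objective, warm-started at the
    full-data optimizer. Because the objective is quadratic, the single
    step lands exactly on the leave-one-out solution -- the refit never
    has to be run."""
    w = ridge_1d(xs, ys, lam)
    sxx = sum(x * x for x in xs)
    # Leave-i-out gradient at w (the full-data gradient vanishes there).
    grad = 2 * xs[i] * (ys[i] - w * xs[i])
    # Curvature of the objective without point i.
    hess = 2 * (sxx - xs[i] ** 2 + lam)
    return w - grad / hess

xs, ys, lam = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9], 0.5
```

For non-quadratic (e.g., logistic) losses the step is only approximate, and for non-smooth penalties the paper's proximal Newton extension is needed.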
Structural modeling using overlapped group penalties for discovering predictive biomarkers for subgroup analysis
Ma, Chong, Deng, Wenxuan, Ma, Shuangge, Liu, Ray, Galinsky, Kevin
The identification of predictive biomarkers from a large number of covariates for subgroup analysis has attracted considerable attention in medical research. In this article, we propose a generalized penalized regression method with a novel penalty function that enforces the hierarchy between prognostic and predictive effects, such that a nonzero predictive effect forces its ancestor prognostic effects to be nonzero in the model. Our method selects useful predictive biomarkers by yielding a sparse, interpretable, and predictive model for subgroup analysis, and can handle different types of response variable, including continuous, categorical, and time-to-event data. We show that our method is asymptotically consistent under standard regularity conditions. To minimize the generalized penalized regression objective, we propose a novel optimization algorithm that integrates majorization-minimization with the alternating direction method of multipliers, which we name \texttt{smog}. Extensive simulation and real case studies demonstrate that our method is powerful for discovering the true predictive biomarkers and identifying subgroups of patients.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
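In ADMM-based algorithms of this kind, the sparsity penalty typically enters through a proximal (soft-thresholding) update. A generic sketch of that building block, not the paper's actual smog updates:

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*|.|: argmin_w 0.5*(w - v)**2 + t*|w|.
    This shrink-toward-zero step is the standard l1 update inside ADMM
    iterations for sparse penalized regression."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def group_soft_threshold(vec, t):
    """Group analogue: shrink the whole coefficient group toward zero and
    kill it entirely when its norm falls below t -- the kind of all-or-
    nothing group update that hierarchy-enforcing penalties build on."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= t:
        return [0.0] * len(vec)
    scale = 1.0 - t / norm
    return [scale * v for v in vec]
```

Zeroing a group as a unit is what lets a structured penalty guarantee that a predictive effect cannot survive in the model without its ancestor prognostic effects.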